Amendments to Claims

This listing of claims will replace all prior versions, and listings, of claims in the application: 
Listing of Claims

1 (currently amended): A method of extracting classifying data from an audio 
signal, the method comprising the steps of: 

(a) processing said audio signal into a perceptual representation of its constituent 
frequencies; 

(b) processing said perceptual representation into at least one learning representation 
of said audio data stream; 

(c) inputting at least one said learning representation into a multi-stage classifier, said
multi-stage classifier comprising one or more first stage classifiers and a final stage metalearner
classifier, the first stage classifiers receiving the learning representations and generating a
metalearner vector which is utilized by the final stage metalearner classifier to generate the
classification of said audio signal, whereby said multi-stage classifier extracts classifying data
from said learning representations and outputs the classification of said audio signal.
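
For illustration only, and not as part of the claim language, the data flow recited in step (c) might be sketched as below. The linear scorers are hypothetical stand-ins for whatever first stage classifiers are used, the arg-max function is a hypothetical stand-in for the final stage metalearner classifier, and all names and weights are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def metalearner_vector(first_stage: Dict[str, Callable[[Vector], float]],
                       learning_representation: Vector) -> List[float]:
    """Run every first stage classifier and collect one score per category."""
    return [score(learning_representation) for score in first_stage.values()]


def classify(first_stage: Dict[str, Callable[[Vector], float]],
             final_stage: Callable[[List[float]], str],
             learning_representation: Vector) -> str:
    """Two-stage flow: first stage scores feed the final stage classifier."""
    return final_stage(metalearner_vector(first_stage, learning_representation))


def linear_scorer(weights: Vector, bias: float) -> Callable[[Vector], float]:
    """Hypothetical first stage classifier: a plain linear decision function."""
    def score(x: Vector) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


# Assumed toy categories and weights, chosen only to make the sketch runnable.
CATEGORIES = ["genre_a", "genre_b"]
FIRST_STAGE = {
    "genre_a": linear_scorer([1.0, -1.0], 0.0),
    "genre_b": linear_scorer([-1.0, 1.0], 0.0),
}


def argmax_final_stage(vector: List[float]) -> str:
    """Hypothetical final stage: pick the category with the largest score."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return CATEGORIES[best]
```

Calling `classify(FIRST_STAGE, argmax_final_stage, [0.9, 0.1])` passes the learning representation through both stages and returns a category label.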

2 (original): The method of extracting classifying data from an audio signal according 
to claim 1, wherein the step of processing the audio data into a perceptual representation of its
constituent frequencies comprises calculating, for a time sample window of a digital 
representation of said audio signal, a Fast Fourier Transform function. 

3 (original): The method of extracting classifying data from an audio signal according 
to claim 1, wherein the step of processing said perceptual representation into at least one learning 
representation further comprises dividing said perceptual representation into a plurality of time 
slices. 

4 (original): The method of extracting classifying data from an audio signal according
to claim 3, wherein each of said time slices is about 0.8 to about 1.2 seconds in length.
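
For illustration only, dividing a sampled signal into time slices of roughly one second might look like the sketch below. The function name, the non-overlapping layout, and the choice to drop a trailing partial slice are all assumptions of this sketch; the claims do not specify them.

```python
from typing import List, Sequence


def time_slices(samples: Sequence[float], sample_rate: int,
                slice_seconds: float = 1.0) -> List[Sequence[float]]:
    """Split samples into consecutive, non-overlapping slices.

    slice_seconds defaults to 1.0, inside the "about 0.8 to about 1.2
    seconds" range; a trailing partial slice is dropped (an assumption).
    """
    size = int(round(sample_rate * slice_seconds))
    if size <= 0:
        raise ValueError("slice length must cover at least one sample")
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]
```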

5 (original): The method of extracting classifying data from an audio signal according 
to claim 1, wherein the step of dividing the perceptual representation into learning
representations further comprises dividing said perceptual representation into a plurality of 
frequency bands. 

6 (original): The method of extracting classifying data from an audio signal according 
to claim 5, wherein said plurality of frequency bands comprises 20 frequency bands. 

7 (currently amended): The method of extracting classifying data from an audio 
signal according to claim 5, wherein the size of each of said frequency bands grows according to 
a golden ratio of frequency with respect to pitch.

8 (original): The method of extracting classifying data from an audio data stream 
according to claim 5, wherein no said frequency band includes any frequency greater than 11
kHz.
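
For illustration only, one possible reading of claims 5 through 8 is sketched below: 20 frequency bands covering 0 Hz to 11 kHz, with each band wider than the previous one by the golden ratio. This geometric-width interpretation, the 0 Hz lower bound, and the function name are assumptions of the sketch; the claims do not state how the golden ratio is applied.

```python
import math
from typing import List

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def golden_band_edges(num_bands: int = 20, max_hz: float = 11000.0) -> List[float]:
    """Return num_bands + 1 band edges in Hz, from 0 up to max_hz.

    Band widths form a geometric series with ratio PHI (an assumed
    interpretation), scaled so the last edge lands on max_hz.
    """
    # Sum of the geometric series w, w*PHI, ..., w*PHI**(n-1) equals max_hz.
    first_width = max_hz * (PHI - 1) / (PHI ** num_bands - 1)
    edges = [0.0]
    width = first_width
    for _ in range(num_bands):
        edges.append(edges[-1] + width)
        width *= PHI
    return edges
```

Under this reading the lowest bands are very narrow and the highest band spans several kilohertz, so a practical system would likely start from a nonzero lower bound.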

9 (currently amended): The method of extracting classifying data from an audio 
signal according to claim 1, wherein said first stage classifier of said multi-stage classifier
comprises at least one Support Vector Machine. 

10 (currently amended): The method of extracting classifying data from an audio 
signal according to claim 10, wherein said first stage classifier of said multi-stage classifier
comprises at least one Support Vector Machine per category of classification. 

11 (currently amended): The method of extracting classifying data from an audio
signal according to claim 1, wherein said final stage metalearner classifier of said multi-stage
classifier comprises a neural network.

12 (original): The method of extracting classifying data from an audio signal according
to claim 11, wherein said neural network comprises at least one input node per category of
classification, and further wherein said neural net comprises at least one output node per category
of classification.

13 (original): The method of extracting classifying data from an audio signal according 
to claim 12, wherein said neural network comprises a hidden layer, wherein said hidden layer
comprises at least as many nodes as the number of said input nodes. 

14 (original): The method of extracting classifying data from an audio signal according 
to claim 11, wherein said neural network operates on a Gaussian activation function.
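
For illustration only, a forward pass through a small network of the shape recited in claims 12 through 14 might be sketched as below: one input node and one output node per category, a single hidden layer, and a Gaussian activation. The specific form exp(-x^2), the use of the same activation on the output layer, and all names are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def gaussian(x: float) -> float:
    """Gaussian activation, assumed here to be exp(-x**2)."""
    return math.exp(-x * x)


def layer(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [gaussian(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(metalearner_vector: Sequence[float],
            hidden_w: Sequence[Sequence[float]], hidden_b: Sequence[float],
            out_w: Sequence[Sequence[float]], out_b: Sequence[float]) -> List[float]:
    """Input nodes -> hidden layer -> one output activation per category."""
    hidden = layer(metalearner_vector, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)
```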

15 (original): The method of extracting classifying data from an audio signal according 
to claim 1, wherein said classifying data comprises at least one of artist and genre. 

16 (original): The method of extracting classifying data from an audio signal according 
to claim 1, further comprising the step of converting said audio signal into a pulse code 
modulated digital bitstream. 

17 (original): The method of extracting classifying data from an audio signal according 
to claim 1, further comprising the step of measuring the confidence of said classification by said 
multi-stage classifier. 

18 (currently amended): A computer readable storage medium, storing therein a
program of instructions for causing a computer to execute process of extracting classifying data
from an audio signal, said process comprising the steps of: 

(a) processing said audio signal into a perceptual representation of its constituent 
frequencies; 

(b) processing said perceptual representation into at least one learning representation;

(c) inputting said learning representations of said audio data stream into a multi-stage
classifier, said multi-stage classifier comprising one or more first stage classifiers and a final
stage metalearner classifier, the first stage classifiers receiving the learning representations and
generating a metalearner vector which is utilized by the final stage metalearner classifier to
generate the classification of said audio signal, whereby said multi-stage classifier extracts
classifying data from said learning representations and outputs the classification of said audio
signal.

19 (withdrawn): A method of representing an audio signal for machine learning
comprising:

(a) creating a perceptual representation of said audio signal by performing a 
frequency domain transform on at least one time-sampled window of a digital representation of 
said audio signal, said perceptual representation comprising component magnitudes of 
constituent frequency vectors that comprise said audio signal; 

(b) calculating a magnitude of each constituent frequency vector within said audio
signal;

(c) grouping each of said constituent frequency vectors into a number of frequency
bands; 

(d) calculating an average magnitude of said constituent frequency vectors within 
each of said frequency bands; and 

(e) arranging said magnitudes into a learning representation. 
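
For illustration only, steps (a) through (e) might be sketched as below. A direct discrete Fourier transform is used as a simple stand-in for a fast frequency domain transform, and the band edges are given as bin indices; both choices, and all function names, are assumptions of this sketch.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(window: Sequence[float]) -> List[float]:
    """Steps (a)-(b): magnitude of each frequency bin from 0 to Nyquist."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def band_averages(magnitudes: Sequence[float],
                  band_edges: Sequence[int]) -> List[float]:
    """Steps (c)-(d): average the bin magnitudes inside each band.

    Band i covers bins band_edges[i] up to band_edges[i + 1] - 1.
    """
    return [sum(magnitudes[lo:hi]) / (hi - lo)
            for lo, hi in zip(band_edges, band_edges[1:])]


def learning_representation(window: Sequence[float],
                            band_edges: Sequence[int]) -> List[float]:
    """Step (e): arrange the per-band averages into one vector."""
    return band_averages(dft_magnitudes(window), band_edges)
```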

20 (withdrawn): The method according to claim 19 wherein said frequency domain 
transform is a Fast Fourier Transform. 

21 (withdrawn): The method according to claim 19 wherein an average magnitude 
of said constituent frequency vectors within each of said frequency bands further comprises an 
aggregate average magnitude over a plurality of said time-sampled windows.

22 (withdrawn): The method according to claim 21 where said plurality of time-
sampled windows comprises 12 time-sampled windows.

23 (withdrawn): The method according to claim 19 wherein no said frequency band
includes any frequency greater than 11 kHz.

24 (withdrawn): The method according to claim 19 wherein said frequency bands
grow in size according to the golden ratio of frequency with respect to pitch. 

25 (withdrawn): The method according to claim 19 further comprising the step of 
converting said audio signal into a pulse code modulated bitstream for processing by said 
frequency domain transform. 

26 (withdrawn): A computer readable storage medium, storing therein a program of 
instructions for causing a computer to execute process of representing an audio signal for 
machine learning, said process comprising the steps of: 

(a) creating a perceptual representation of said audio signal by performing a
frequency domain transform on at least one time-sampled window of a digital representation of
said audio signal, said perceptual representation comprising component magnitudes of
constituent frequency vectors that comprise said audio signal;

(b) calculating a magnitude of each constituent frequency vector within said audio
signal;

(c) grouping each of said constituent frequency vectors into a number of frequency
bands;

(d) calculating an average magnitude of said constituent frequency vectors within 
each of said frequency bands; and 

(e) arranging said magnitudes into a learning representation. 

Appl. No. 09/939,954
Amendment dated July 1, 2005
Reply to Non-Final Office Action of February 1, 2005

27 (currently amended): An apparatus for classifying an audio data stream 
comprising: 

(a) a means for converting an audio data stream into a perceptual representation of its
constituent frequencies;

(b) a means for dividing said perceptual representation into learning representations;
and

(c) a multi-stage classifying means trained to distinguish among classifying categories 
of said audio data stream, wherein said multi-stage classifying means further comprises one or 
more first stage classifying means and a final stage metalearner classifying means, said first stage 
classifying means receiving the learning representations and generating a metalearner vector
which is utilized by the final stage metalearner classifying means to generate the classification of
said audio signal and outputs the classification of said audio signal.

28 (original): The apparatus according to claim 27, wherein the said means for converting
an audio data stream into a perceptual representation of its constituent frequencies comprises 
means to perform a Fast Fourier Transform function on at least one time-sampled window digital 
representation of said audio stream. 

29 (original): The apparatus according to claim 27, wherein a means for dividing said 
perceptual representation into learning representations further comprises means for dividing said 
perceptual representation into a plurality of time slices. 

30 (original): The apparatus according to claim 29, wherein each of said time slices is
about 0.8 to about 1.2 seconds in length.

31 (original): The apparatus according to claim 27, wherein said means for dividing said 
perceptual representation into learning representations further comprises means for dividing said 
perceptual representation into a plurality of frequency bands. 

32 (original): The apparatus according to claim 31, wherein said plurality of frequency
bands comprises 20 frequency bands. 

33 (currently amended): The apparatus according to claim 31, wherein the size of
each of said frequency bands grows according to a golden ratio of frequency with respect to
pitch. 

34 (original): The apparatus according to claim 31, wherein no said frequency includes
any frequency higher than 11 kHz.

35 (currently amended): The apparatus according to claim 27, wherein said first
stage classifier of said multi-stage classifier comprises at least one Support Vector Machine. 

36 (currently amended): The apparatus according to claim 36, wherein said first 
stage classifier of said multi-stage classifier comprises at least one Support Vector Machine per 
category of classification. 

37 (currently amended): The apparatus according to claim 27, wherein said final
stage metalearner classifier of said multi-stage classifier comprises a neural network.

38 (original): The apparatus according to claim 37, wherein said neural network 
comprises at least one input node per category of classification, and further wherein said neural 
net comprises at least one output node per category of classification.

39 (original): The apparatus according to claim 38, wherein said neural network 
comprises a hidden layer, wherein said hidden layer comprises at least as many nodes as the 
number of said input nodes. 

40 (original): The apparatus according to claim 37, wherein said neural network operates 
on a Gaussian activation function. 

41 (original): The apparatus according to claim 27, wherein said classifying categories
comprise at least one of artist and genre. 

42 (original): The apparatus according to claim 27, further comprising a means to 
convert said audio signal into a pulse code modulated digital bitstream. 

43 (original): The apparatus according to claim 27, further comprising a means for
measuring the confidence of said classification by said multi-stage classifier. 

44 (withdrawn): An apparatus for representing an audio signal for machine learning 
comprising: 

(a) a means for performing a frequency domain transform on at least one time- 
sampled window of a digital representation of said audio signal, said perceptual representation 
comprising component magnitudes of constituent frequency vectors that comprise said audio 
signal; 

(b) a means for calculating a magnitude of each constituent frequency vector; 

(c) a means for grouping each of said constituent frequency vectors into a number of 
frequency bands; 

(d) a means for calculating an average magnitude of said constituent frequency
vectors within each of said frequency bands; and

(e) a means for arranging said magnitudes into a learning representation. 

45 (withdrawn): The apparatus according to claim 44 wherein said means for 
performing a frequency domain transform comprises a means for performing a Fast Fourier 
Transform. 

46 (withdrawn): The apparatus according to claim 44 wherein no said frequency 
band includes any frequency greater than 11 kHz.

47 (withdrawn): The apparatus according to claim 44 wherein said frequency bands
grow in size according to the golden ratio of frequency with respect to pitch. 

48 (withdrawn): The apparatus according to claim 44 further comprising a means 
for converting said audio signal into a pulse code modulated bitstream for processing by said 
frequency domain transform. 